-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Workflow Run RO-crate format #39
base: master
Are you sure you want to change the base?
Conversation
add encodingFormat for nextflow.config
feat: add wrroc to valid formats
* fx: make getIntermediateOutputFiles work again * Fix bugs fixes #16 fixes #17 --------- Co-authored-by: fbartusch <[email protected]>
* fix #7 Signed-off-by: fbartusch <[email protected]>
* feat: add README to create * feat: ignore vscode * fix: make getIntermediateOutputFiles work again (#18) (#19) * fx: make getIntermediateOutputFiles work again * Fix bugs fixes #16 fixes #17 --------- Co-authored-by: fbartusch <[email protected]> * feat: add README to json * feat: check first if readme exists * Add readme to hasPart Signed-off-by: fbartusch <[email protected]> --------- Signed-off-by: fbartusch <[email protected]> Co-authored-by: fbartusch <[email protected]>
* Add getEncodingFormat function that return the encoding format for a file * handle YAML files manually Signed-off-by: fbartusch <[email protected]>
* implements #1 Signed-off-by: fbartusch <[email protected]>
Iss7 directory type
* start with metaYaml imports * merge dev-wrroc into metaYaml (#23) * add encodingFormat for nextflow.config * add encodingFormat for main.nf * feat: add wrroc to valid formats * fix: make getIntermediateOutputFiles work again (#18) * fx: make getIntermediateOutputFiles work again * Fix bugs fixes #16 fixes #17 --------- Co-authored-by: fbartusch <[email protected]> * feat: add README to crate (#14) * feat: add README to create * feat: ignore vscode * fix: make getIntermediateOutputFiles work again (#18) (#19) * fx: make getIntermediateOutputFiles work again * Fix bugs fixes #16 fixes #17 --------- Co-authored-by: fbartusch <[email protected]> * feat: add README to json * feat: check first if readme exists * Add readme to hasPart Signed-off-by: fbartusch <[email protected]> --------- Signed-off-by: fbartusch <[email protected]> Co-authored-by: fbartusch <[email protected]> --------- Signed-off-by: fbartusch <[email protected]> Co-authored-by: fbartusch <[email protected]> * WIP * only add from meta if meta exists * remove usage from ext args * add module name to id --------- Signed-off-by: fbartusch <[email protected]> Co-authored-by: fbartusch <[email protected]>
@bentsherman maybe you can have a look here :) |
Have you got an example RO-Crate generated with this version of the plugin? |
ro-crate-metadata.json |
Is this superseding #33 now? |
I ran
Since I don't have the original crate I'm wondering, do all relative paths correspond to existing files in the crate, e.g., is there a
I tried to run the pipeline following the instructions at the repo above. I had to manually change the plugin's version (to 1.1.0-DEV) after installing it, otherwise Nextflow downloads the
Despite that, the run went on and I got an RO-Crate, but without the |
@simleo I will try and answer to the issues you had :)
Yes this path is copied like this to the folder in which the json can be found. But that might not be the most elegant solution as the input files could also be put in a folder called input or something. What do you think?
Yes the installation of the plugin needs to be fixed. I always run Seems like something is off with your inputs. Did you download the files as described in the README of the pipeline? |
It's not that important, what matters is that there are no clashes between files due to their names.
Yes, and I put them in the root directory of the
The command I ran is:
|
Then I think you might need to update to a more recent Nextflow version as I only tested this for versions above 24.04. But you should be able to test the plugin with any pipeline - maybe running an nf-core pipeline that is better maintained is more reliable! |
@bentsherman I would appreciate any feedback on this so we can get this finished up this month? As soon as people start using the plugin I guess more requests for changes / improvements / updates will come in. |
The metadata file looks OK. One thing I suggest changing is Regarding the crate structure, it looks fine from your screenshot, though some files added to the crate are not listed in the metadata (e.g. However, I still cannot reproduce your result after upgrading Nextflow to 24.10.3. The output directory is missing several files and directories including
(with respect to Nextflow 23.10.0 the |
Can you try changing the config file to something like: (mainly updating the version of the plugin)
|
Thanks @famosab, I finally managed to install the right version of the plugin. The run finished with no errors and no warnings, and I got a valid RO-Crate. It's looking pretty good already, but some things need to be fixed. I think the most important one is
|
Taking a look this afternoon. Expect some minor edits soon |
Signed-off-by: Ben Sherman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some minor cleanup so far. The render()
function is pretty long, so I'm going to see if I can move some code into helper functions to make it easier to read at a high-level. Please refrain from making edits for now as I work through the code. I left some comments/questions in the meantime.
final configMap = session.config | ||
|
||
// Set RO-Crate Root and workdir | ||
this.crateRootDir = Path.of(params['outdir'].toString()).toAbsolutePath() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be the parent directory of the ro-crate-metadata.json
instead, because params.outdir
is not universal to all pipelines
// Add intermediate input files (produced by workflow tasks and consumed by other tasks) | ||
workflowInputs.addAll(getIntermediateInputFiles(tasks, workflowInputs)) | ||
final workflowInputMapping = getWorkflowInputMapping(workflowInputs) | ||
|
||
// Add intermediate output files (produced by workflow tasks and consumed by other tasks) | ||
workflowOutputs.putAll(getIntermediateOutputFiles(tasks, workflowOutputs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
workflowInputs
and workflowOutputs
are meant to contain only the pipeline inputs/outputs, not the intermediate outputs. I would keep them separate.
// Copy workflow input files into RO-Crate | ||
workflowInputMapping.each { source, dest -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you are copying the intermediate files into the RO-crate? I don't think this is feasible in general. I think it would be better to save only a record of the task inputs/outputs with a checksum. That is what the BCO does at least.
// Copy workflow output files into RO-Crate | ||
workflowOutputs.each { source, dest -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting intermediate files aside, the workflow outputs should already be present in the output directory, so there should be no need to copy them. The most I would do is verify that each workflow output actually resides in the output directory and warn the user if it doesn't.
// Copy workflow into crate directory | ||
Files.copy(scriptFile, crateRootDir.resolve(scriptFile.getFileName()), StandardCopyOption.REPLACE_EXISTING) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simleo you mentioned copying all of the pipeline code into the crate, but is this really necessary? If the crate specifies a git repository, revision, and main script path, the user can reproduce the pipeline code at any time.
// license information | ||
final license = [ | ||
"@id" : manifest.license, | ||
"@type": "CreativeWork" | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nextflow 24.10 added the manifest.license
config option. Since the latest version of nf-prov requires 24.10 , I went ahead and refactored this bit to use this option instead of a custom option
We worked on a first version of the plugin which is able to render valid RO-crates for any workflow run.
Happy to receive feedback to get this finished up :)
Continues #19 and #33.